posted 12-13-2007 02:36 PM
OK, here goes. Here is a link to a comparison of OSS-3 results with the training sample - with and without a Bonferroni correction to the alpha.
Bonferroni, as explained last night, is the correction for the mathematically inevitable inflation of alpha when completing multiple simultaneous significance tests (the spot scoring rule in ZCT exams, and the MGQT scoring rules). It corrects for the increased likelihood of false-positive results due to the addition rule for dependent probability events.
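As a back-of-the-envelope illustration (my own sketch, not OSS-3 code), here is how the familywise alpha inflates as you add simultaneous spot tests, and what the Bonferroni-corrected per-spot alpha looks like. The inflation formula below assumes independent tests; the addition rule gives a comparable bound for dependent events:

```python
# Familywise alpha inflation across m simultaneous spot tests, and the
# Bonferroni-corrected per-test alpha. Illustration only, not OSS-3 code.
alpha = 0.05

for m in (1, 2, 3, 4):
    inflated = 1 - (1 - alpha) ** m   # P(at least one FP), independent tests
    corrected = alpha / m             # Bonferroni per-spot alpha
    print(f"{m} spot(s): familywise alpha = {inflated:.3f}, "
          f"Bonferroni per-spot alpha = {corrected:.4f}")
```

With four spots, the uncorrected familywise alpha is already about .185 rather than .05, which is exactly the FP inflation the correction is aimed at.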
http://www.oss3.info/OSS-3_Training_sample_w_and_wo_Bonferonni.pdf
You can see there is a trade-off: a significant reduction in FPs, from 10.4% to 4.8% (p = .006), when we apply the Bonferroni correction to the alpha of .05, at the cost of increased inconclusives, from 1.4% to 4.5% (p < .001). 4.5% is still pretty low, so we're not concerned about the INC rate; FPs over 10% are a problem. Bonferroni works well. Also, these figures are from the training sample, which is a theoretical best fit, since we used this same data for bootstrap training of the OSS-3 model. We wouldn't expect these low INC rates to generalize to other data or to live field situations, but this is what we have with the study data.
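For anyone who wants to check differences in proportions like these themselves, a minimal two-proportion z-test can be written in a few lines of Python (a generic sketch; the counts in the example are hypothetical, and I'm not asserting this is the exact procedure behind the p-values above):

```python
from math import sqrt, erf

def two_prop_z(x1, n1, x2, n2):
    """Two-sided, pooled two-proportion z-test. Returns (z, p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal-approximation p
    return z, p

# Hypothetical counts: 8 FPs among 100 truthful cases vs. 20 among 100
z, p = two_prop_z(8, 100, 20, 100)
```

The normal approximation is adequate at these sample sizes; for small cells you'd want an exact test instead.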
Now, below are the OSS-3 results with a 2nd validation sample of mixed-format cases from the DACA/DoDPI archive.
These are pre-2002 exams. This one-page document compares the efficiency of the Senter/2-stage rules to traditional MGQT/spot rules with the multi-facet exams.
http://www.oss3.info/OSS-3_mixed_format_validation_validation_sample.pdf
We've chosen to discuss the results in the traditional terms of sensitivity (to deception) and specificity (to truthfulness), because these terms are easily understood by non-polygraph scientists. Sensitivity and specificity are essentially accuracy rates, with inconclusives included, for deceptive and truthful subjects.
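In code, that usage looks like the following (a sketch with hypothetical counts, not data from any of the samples above) - the key point being that inconclusives stay in the denominator:

```python
# Sensitivity/specificity with INCs counted against accuracy; hypothetical
# counts for illustration, not data from any OSS-3 sample.
def sens_spec(tp, fn, inc_d, tn, fp, inc_t):
    sensitivity = tp / (tp + fn + inc_d)  # correct calls among deceptive
    specificity = tn / (tn + fp + inc_t)  # correct calls among truthful
    return sensitivity, specificity

sens, spec = sens_spec(tp=85, fn=5, inc_d=10, tn=80, fp=10, inc_t=10)
# sens = 0.85, spec = 0.80
```

Accuracy "without inconclusives" would drop the INC terms from the denominators, which is why it can obscure differences between decision rules.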
You can see the obvious advantages of the Senter/2-stage rules, with the mixed format cases. This difference would be more obscured if we chose to illustrate the data in terms of accuracy without inconclusives.
While the MGQT/spot rules provide good sensitivity to deception, they offer meager specificity compared to the healthy balance that the 2-stage rules achieve. This is consistent with Senter's work from 2003 and 2005.
Because the results of a diagnostic test become a basis for action (e.g., further investigation, prenatal care, medications, surgery, rights-altering decisions, etc.), they must provide adequate specificity.
MGQT rules are a problem in multi-facet exams. Multi-facet exams are known-incident exams, and diagnostic tests must provide adequate specificity - to assure we can rule out people who are not involved in, or do not express, the condition or issue of concern.
For this reason, OSS-3 uses 2-stage rules for multi-facet exams.
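A hedged sketch of the two-stage logic (my simplification, with illustrative thresholds - not the actual OSS-3 decision rules): stage 1 tests the overall result, and only if that is inconclusive does stage 2 consult the individual spots, at a Bonferroni-corrected per-spot alpha:

```python
# Illustrative two-stage (Senter-style) decision rule. Assumes a small
# p_total indicates deception and a large p_total indicates truthfulness;
# thresholds and alpha handling are simplified for illustration only.
def two_stage(p_total, spot_ps, alpha=0.05):
    if p_total <= alpha:
        return "DI"    # stage 1: deception indicated overall
    if p_total >= 1 - alpha:
        return "NDI"   # stage 1: no deception indicated overall
    # stage 2: spots are consulted only when stage 1 is inconclusive,
    # at a Bonferroni-corrected per-spot alpha
    if any(p <= alpha / len(spot_ps) for p in spot_ps):
        return "DI"
    return "INC"
```

Because the spot tests only fire when the overall test is inconclusive, the rule keeps the spot rules' sensitivity while avoiding much of their FP cost - which is the balance shown in the mixed-format results above.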
So, Senter rules prevail with both single-issue and multi-facet event-specific exams. However, screening exams are often mixed-issue - so what do we do about that?
We have decision rules specifically for screening exams - using an omnibus nonparametric ANOVA (the Kruskal-Wallis test) that is optimized for sensitivity. Omnibus statistics are simply formulae intended to test everything at once - they eliminate the effect of the addition rule on inflated alpha, and the resulting increase in errors and INCs.
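A minimal Kruskal-Wallis example using scipy (the spot scores below are made up just to show the shape of an omnibus test; this is not OSS-3's actual computation):

```python
# One omnibus Kruskal-Wallis test across three spots at once: one test,
# one alpha, so the addition rule never comes into play. Made-up scores.
from scipy.stats import kruskal

spot1 = [-1.2, -0.8, -1.5]
spot2 = [0.3, 0.1, -0.2]
spot3 = [1.1, 0.9, 1.4]

H, p = kruskal(spot1, spot2, spot3)
print(f"H = {H:.2f}, p = {p:.4f}")
```

Contrast this with testing each spot separately at alpha = .05, where the familywise error rate climbs with every added spot.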
Here is a link to a validation study of OSS-3 using a sample of LEPET screening exams from Texas.
http://www.oss3.info/OSS-3_screening_validation.pdf
Again you can see the inadequate specificity of the spot rules. You can also see the reduction in INCs, though we would caution against expecting INCs that low in live field situations.
Because screening test results are used differently than the results of diagnostic tests, we also have different priorities for our accuracy considerations. In screening contexts, negative results mean we're done - further action ceases (e.g., meds, surgery, investigation, testing, whatever). So what we really value is adequate sensitivity (to deception), with the understanding that we will tolerate some FPs and clean them up at later stages of our diagnostic work.
For that reason, it is both acceptable and common in screening contexts not to include a Bonferroni correction to the inflated alpha.
Here is a link to the results of the LEPET sample, using the screening rules with and without the Bonferroni correction.
http://www.oss3.info/OSS-3_LEPET_screening_w_and_wo_Bonferonni.pdf
Consistent with our theoretical expectations, and with experience in other testing and research environments, you can see that our screening test is more accurate, with fewer false negatives, without the Bonferroni correction. While the p-values are not statistically significant, it is still acceptable to do everything we can to optimize the test for its intended use.
Someone (Barry) should fact-slap Sergeant1107 with the idea that when we are talking about a test whose accuracy is already well above chance, all improvements will be modest - but even modest improvements are important (and sometimes hard to achieve).
OK, any questions?
r
------------------
"Gentlemen, you can't fight in here. This is the war room."
--(Stanley Kubrick/Peter Sellers - Dr. Strangelove, 1964)